12  Characterizing community diversity

Learning Objectives

After completing this lab you should be able to

You should have already downloaded the project directory for this module, make sure the directory is unzipped and move it to your bi328 directory. You can open the Rproj for this module either by double clicking on it which will launch Rstudio or by opening Rstudio and then using File > Open Project or by clicking on the Rproject icon in the top right of your program window and selecting Open Project.

Once you have opened a project you should see the project name in the top right corner1.

  • 1 Pro tip: If you run into issues where a quarto document won’t render or file paths aren’t working (especially if things were working previously) one of your first steps should be to double check that the correct Rproj is loaded.

  • There should be a document named 12_community-diversity.qmd in your project directory. Use that file to work through this tutorial - you will hand in your rendered (“knitted”) quarto file as your homework assignment. So, first thing in the YAML header, change the author to your name. You will use this quarto document to record your answers.

    In this case, this chapter doubles as your skills challenge for the week.

    Before we start we need to install a package before we get started. vegan is a package that has a wide range of functions that implement standard

    install.packages("vegan")
    
    BiocManager::install("phyloseq")

    Now we can load the packages we need

    # load libraries
    library(phyloseq)
    library(vegan)
    
    library(knitr)
    library(tidyverse)

    12.1 Compile data sets

    We are going to import the data you will need for this chapter exploring how to characterize biological communities. We’ll start by loading to objects into your environment. One is the ASV table and the second is the corresponding taxonomy table from the fungi data set we were working on in the previous chapter.

    We are going to read in a data set that contains information about the soil plots from which the fungi eDNA was isolated.

    # load sample data
    soil <- read_delim("data/soil.csv", delim = ";")
    Give it a whirl

    Use your coding skills to take a look at the soil data set and the describe what information it contains (typical things you want to check are row, column numbers, column names, determining if it is a tidy data set, figuring out what information/variables are in the data set).

    Did it!

    [Your answer here]

    The package phyloseq that we installed has various utility functions to deal with metabarcoding data set. It has a specific object class that allows us to store the asv table, taxonomy table, and sample meta data in different slots. Various functions can then be used to pull information from those slots. We can load the object ps directly into your environment using the following code.

    # create phyloseq object
    load(file = "data/ps.rdata")

    12.2 Data filtering and transformation

    Before we can start exploring our data set we have a few steps to complete to transform it into containing the data we want in the format we want tit.

    Let’s start by taking a looking at how many taxa are currently present in the data set using phyloseq::ntaxa()

    ntaxa(ps)
    [1] 546

    We only want ASVs that were assigned as fungi in our reference database. We can use phyloseq::subset_taxa to filter by kingdom.

    ps_fungi <- subset_taxa(ps, Kingdom == "k__Fungi")
    Give it a whirl

    Figure out how many non-fungi groups were filtered.

    Did it!

    [Your answer here]

    Next, we want to remove any singletons or doubletons. These are ASVs that are only represented by one or two reads - we can assume these are artifacts.

    ps_fungi_nosd <- filter_taxa(ps_fungi, function(x) sum(x) > 2, TRUE)
    Give it a whirl

    Figure out how many taxonomic groups are still in the data set.

    Did it!

    [Your answer here]

    In many situations, metabarcoding eDNA samples is semi-quantitative2. The number of DNA templates in the environment are correlated to the number of individuals/biomass of a given taxonomic group. While you cannot necessarily back calculate the absolute abundance or number of specimen present in an environment you can use the proportion of reads assigned to a taxonomic group as a metric of relative abundance.

  • 2 We can always confidentally use metbarcoding data as qualitative data, i.e. as presence/absence data. Though even here we should be careful about whether “not detected” should be interpreted as absent.

  • One way to convert species abundance from absolute to relative is using a Hellinger transformation which standardizes the abundances to the sample totals and then square roots them. The function phyloseq::transform_sample_counts() allows us to apply a function to transform sample counts.

    ps_fungi_nosd_hel <- transform_sample_counts(ps_fungi_nosd, function(x) sqrt(x/sum(x)))
    Consider this

    Our data set could still contain multiple ASVs that have been assigned to the same species. Explain why this is an expected out come when using ASVs but would be rare/non-existant if you are using OTUs as the output of your bioinformatics pipeline.

    Did it!

    [Your answer here]

    To complete our data filtering and transformation we will use phyloseq::tax_glom() to collapse ASVs asigned to the same taxonomic group at the species level.

    ps_transf = tax_glom(ps_fungi_nosd_hel, "Species", NArm = FALSE)

    Let’s explore our final transformed data set

    Give it a whirl

    Apply the functions ntaxa(), nsamples(), rank_names(), and sample_variable() to our final transformed phyloseq object. Look up what each function does, make sure to comment/annotate your code and then briefly describe what you’ve learned about our data set.

    Did it!

    [Your answer here]

    Recall that our phyloseq object contains a slot that holds our ASVs table and our taxonomic table that we can access at as such.

    otu_table(ps_transf)[1:2, 1:2]
    OTU Table:          [2 taxa and 2 samples]
                         taxa are columns
        ATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACCTTGCGCCCTTTGGTATTCCGAAGGGCATGCCTGTTTGAGTGTCATTAAATTATCAACCTTGCTCGCTTTTATTAGCTTGAGTTAGGCTTGGATGTGAGGGTTTTGCTGGCTTCCTTCAGTGGATGGTCTGCTCCCTTTGAATGCATTAGCGGGATCTCTTGTGGACCGTCACTTGGTGTGATAATTATCTATGCCTTGAGACTTTGAAACAAACTTATGAGAATCTGCTTATAACCGTCCTCACGGACAACTTTTGACAATTTGACCTCAAATCAGGTAGGACTACCCGCTGAACTTAA
    S10                                                                                                                                                                                                                                                                                                                                                            0.00000000
    S11                                                                                                                                                                                                                                                                                                                                                            0.08178608
        ATGCGATACGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCACTCCTTGGTATTCCGAGGAGTATGCCTGTTTCAGTATCATGAGCACTCTCACACCTAACCTTTGGGTTTATGGCGTGGAATTGGAATGCGCCGACTGTCATGGTTGGCCCTTCTAAAATGTAGTTCTTGGCTGTCACCTAATACAGCAGTTTGGCCTAATAGTTTTGGCATTCATTGTCAAATCTTTGGCTAACATTTGCTCCAGGAGTCAGTCTTGATAATACAGAAAACTCATTCAAATTTTGATCTGAAATCAGGTAGGGCTACCCGCTGAACTTAA
    S10                                                                                                                                                                                                                                                                                                                                               0.0000000
    S11                                                                                                                                                                                                                                                                                                                                               0.2085144
    tax_table(ps_transf)[1:2, 1:2]
    Taxonomy Table:     [2 taxa by 2 taxonomic ranks]:
                                                                                                                                                                                                                                                                                                                                                                          Kingdom   
    ATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACCTTGCGCCCTTTGGTATTCCGAAGGGCATGCCTGTTTGAGTGTCATTAAATTATCAACCTTGCTCGCTTTTATTAGCTTGAGTTAGGCTTGGATGTGAGGGTTTTGCTGGCTTCCTTCAGTGGATGGTCTGCTCCCTTTGAATGCATTAGCGGGATCTCTTGTGGACCGTCACTTGGTGTGATAATTATCTATGCCTTGAGACTTTGAAACAAACTTATGAGAATCTGCTTATAACCGTCCTCACGGACAACTTTTGACAATTTGACCTCAAATCAGGTAGGACTACCCGCTGAACTTAA "k__Fungi"
    ATGCGATACGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCACTCCTTGGTATTCCGAGGAGTATGCCTGTTTCAGTATCATGAGCACTCTCACACCTAACCTTTGGGTTTATGGCGTGGAATTGGAATGCGCCGACTGTCATGGTTGGCCCTTCTAAAATGTAGTTCTTGGCTGTCACCTAATACAGCAGTTTGGCCTAATAGTTTTGGCATTCATTGTCAAATCTTTGGCTAACATTTGCTCCAGGAGTCAGTCTTGATAATACAGAAAACTCATTCAAATTTTGATCTGAAATCAGGTAGGGCTACCCGCTGAACTTAA               "k__Fungi"
                                                                                                                                                                                                                                                                                                                                                                          Phylum            
    ATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACCTTGCGCCCTTTGGTATTCCGAAGGGCATGCCTGTTTGAGTGTCATTAAATTATCAACCTTGCTCGCTTTTATTAGCTTGAGTTAGGCTTGGATGTGAGGGTTTTGCTGGCTTCCTTCAGTGGATGGTCTGCTCCCTTTGAATGCATTAGCGGGATCTCTTGTGGACCGTCACTTGGTGTGATAATTATCTATGCCTTGAGACTTTGAAACAAACTTATGAGAATCTGCTTATAACCGTCCTCACGGACAACTTTTGACAATTTGACCTCAAATCAGGTAGGACTACCCGCTGAACTTAA "p__Basidiomycota"
    ATGCGATACGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCACTCCTTGGTATTCCGAGGAGTATGCCTGTTTCAGTATCATGAGCACTCTCACACCTAACCTTTGGGTTTATGGCGTGGAATTGGAATGCGCCGACTGTCATGGTTGGCCCTTCTAAAATGTAGTTCTTGGCTGTCACCTAATACAGCAGTTTGGCCTAATAGTTTTGGCATTCATTGTCAAATCTTTGGCTAACATTTGCTCCAGGAGTCAGTCTTGATAATACAGAAAACTCATTCAAATTTTGATCTGAAATCAGGTAGGGCTACCCGCTGAACTTAA               "p__Mucoromycota" 
    Consider this

    Explain how the notation [1:2, 1:2] modifies the output

    Did it!

    [Your answer here]

    12.3 Combine data sets

    We have three explanatory variables that could be driving differences in fungal communities among samples.

    • Soil samples were taken in differnt forest plots that were classified as dominated by Acer saccharum (AS) or Fagus grandifolia (FG) or mixed with other small trees and shrubs present (mixed).
    • Soil samples where taken from different soil horizons (depths): L, F, H, Ae, or B
    • Soil chemistry (carbon, nitrogen, pH)

    This information is stored in our soil dataframe.

    Now that we’ve filtered and transformed our data set, let’s pull it back out to create a dataframe as the object you are more familiar with in terms of being able to manipulate it.

    Give it a whirl

    Let’s pull out our taxonomic table and transform it into a dataframe. Comment the following code line by line to describe what each function/arguments is doing.

    asv_tax <- tax_table(ps_transf) %>%  #
      as.data.frame() %>%            #
      rownames_to_column("asv")      #
    
    # write out interim file
    write_delim(asv_tax, "results/asv_tax.txt", delim = "\t")
    Give it a whirl

    Now, let’s pull out our information on how many times each ASV is observed in a each sample. Comment the following code line by line to describe what each function/arguments is doing. You may need to look up some of the functions.

    asv_counts <- otu_table(ps_transf) %>%  #
      t() %>%                               #
      as.data.frame() %>%                   #
      rownames_to_column("asv")             #
    
    # write out interim file
    write_delim(asv_counts, "results/asv_counts.txt", delim = "\t")
    Give it a whirl

    Take a look at how your asv_tax and asv_count objects are now formatted and briefly describe it (remember, key things are number or rows, columns, what those columns are).

    Did it!

    [Your answer here]

    Give it a whirl

    Really, what we want is for our asv_counts table to contain the taxonomic information contained in the asv_tax table. Combine those two data sets into an object called tax_count. Remove the ASV sequence from the dataframe and arrange the remaining columns to first have all the taxonomic information, then the number of occurrences in each sample. Print the first few lines to the console when you are done.

    Did it!

    [Your answer here]

    This is what your result should look like

       Kingdom           Phylum                 Class             Order
    1 k__Fungi p__Basidiomycota     c__Agaricomycetes     o__Agaricales
    2 k__Fungi  p__Mucoromycota c__Umbelopsidomycetes o__Umbelopsidales
    3 k__Fungi    p__Ascomycota    c__Dothideomycetes  o__Mytilinidales
    4 k__Fungi p__Basidiomycota     c__Agaricomycetes     o__Agaricales
    5 k__Fungi    p__Ascomycota    c__Sordariomycetes     o__Xylariales
    6 k__Fungi    p__Ascomycota                  <NA>              <NA>
                    Family           Genus       Species       S10        S11
    1  f__Tricholomataceae       g__Mycena          <NA> 0.0000000 0.08178608
    2   f__Umbelopsidaceae   g__Umbelopsis   s__dimorpha 0.0000000 0.20851441
    3        f__Gloniaceae   g__Cenococcum  s__geophilum 0.0000000 0.32414749
    4    f__Hygrophoraceae    g__Hygrocybe s__flavescens 0.2929066 0.40028795
    5 f__Amphisphaeriaceae g__Polyscytalum s__algarvense 0.0000000 0.00000000
    6                 <NA>            <NA>          <NA> 0.5073291 0.22398041
            S12        S13       S14        S15       S16        S17       S18
    1 0.0000000 0.09578263 0.2379155 0.04867924 0.7712319 1.25129257 0.4761444
    2 0.2721655 0.00000000 0.0000000 0.00000000 0.0000000 0.00000000 0.0000000
    3 0.5233063 0.13545709 0.0000000 0.23676137 0.2667853 0.00000000 0.0000000
    4 0.0000000 0.00000000 0.0000000 0.06884284 0.0000000 0.00000000 0.0000000
    5 0.0000000 0.00000000 0.0000000 0.00000000 0.0000000 0.00000000 0.3165055
    6 0.1200137 0.19156526 0.0000000 0.13299415 0.2226355 0.07392213 0.5152174
              S1       S20        S21        S22        S23       S24       S25 S26
    1 0.07372098 0.1543370 0.15971703 0.04598005 0.17325923 0.1726902 0.2505837   0
    2 0.24818179 0.0000000 0.00000000 0.00000000 0.00000000 0.0000000 0.0000000   0
    3 0.36422877 0.0000000 0.00000000 0.00000000 0.00000000 0.0000000 0.0000000   0
    4 0.24077171 0.0000000 0.00000000 0.00000000 0.00000000 0.0000000 0.0000000   0
    5 0.00000000 0.6046415 0.06052275 0.34039602 0.00000000 0.4626177 0.3245927   0
    6 0.41216069 0.5922461 0.25094334 0.47371431 0.04331481 0.3527087 0.5436833   0
            S27        S28        S29        S2       S30       S31        S33
    1 0.9263177 0.04145133 0.27471034 0.1142577 0.4917893 0.4379003 0.07124705
    2 0.0000000 0.00000000 0.00000000 0.1842351 0.0000000 0.0000000 0.00000000
    3 0.0000000 0.00000000 0.00000000 0.0000000 0.0000000 0.0000000 0.19991094
    4 0.0000000 0.00000000 0.00000000 0.0000000 0.0000000 0.0000000 0.00000000
    5 0.0000000 0.13747852 0.04729838 0.0000000 0.3446360 0.3636152 0.00000000
    6 0.0000000 0.33819215 0.17651995 0.7626946 0.1953884 0.8801878 0.12162632
            S34        S35       S36        S37       S39         S3       S41
    1 0.0000000 0.04393748 0.0559017 0.09449112 0.1474420 0.06516352 0.1036952
    2 0.3815359 0.00000000 0.2091650 0.18898224 0.6158141 0.36572936 0.0000000
    3 0.0000000 0.77616588 0.1118034 0.33838581 0.2197935 0.00000000 0.0000000
    4 0.5209007 0.00000000 0.0000000 0.00000000 0.0695048 0.14497221 0.2375955
    5 0.0000000 0.00000000 0.0000000 0.00000000 0.0000000 0.00000000 0.0000000
    6 0.0000000 0.00000000 0.0000000 0.29827034 0.4928534 0.00000000 1.1342749
             S42        S43       S44       S45        S46       S47       S48
    1 0.07808688 0.05123155 0.1952834 0.1954906 0.08441499 0.0000000 0.1773317
    2 0.19908326 0.00000000 0.0000000 0.0000000 0.09747404 0.0000000 0.0000000
    3 0.17460757 0.61471931 0.1301889 0.0000000 0.21037942 0.5733137 0.6634328
    4 0.68553927 0.00000000 0.0000000 0.0000000 0.00000000 0.0000000 0.0000000
    5 0.00000000 0.00000000 0.0000000 0.0000000 0.00000000 0.0000000 0.0000000
    6 0.33809195 0.33116596 0.0000000 0.1382327 0.10897929 0.3473466 0.0000000
             S49         S4       S50       S51        S52       S53        S54
    1 0.00000000 0.08962214 0.1162476 0.1076244 0.09853293 0.1624354 0.08980265
    2 0.00000000 0.00000000 0.0000000 0.0000000 0.00000000 0.0000000 0.15554275
    3 0.78598070 0.50232050 0.4739717 0.0000000 0.30722902 0.0000000 0.77815937
    4 0.00000000 0.00000000 0.0000000 0.6499631 1.03997108 0.3456639 0.00000000
    5 0.00000000 0.00000000 0.0000000 0.2107167 0.00000000 0.0000000 0.00000000
    6 0.09724333 0.00000000 0.0000000 0.2818244 0.19706586 0.7353688 0.27679036
             S55        S56       S58        S59         S5        S60        S61
    1 0.07722833 0.00000000 0.0000000 0.00000000 0.08436491 0.05852057 0.09072184
    2 0.00000000 0.35948681 0.2656845 0.00000000 0.00000000 0.00000000 0.06415003
    3 0.14788099 0.09607689 0.3510009 0.07800765 0.00000000 0.00000000 0.00000000
    4 0.16683226 0.13587324 0.0000000 0.00000000 0.00000000 1.19488153 0.57239218
    5 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000 0.00000000 0.00000000
    6 0.29652140 0.24485651 0.2425356 0.00000000 0.17896500 0.34818263 0.46770187
            S62        S63        S64        S65        S66        S67        S68
    1 0.1528942 0.00000000 0.04252433 0.09911197 0.04756515 0.00000000 0.08119979
    2 0.0000000 0.31596888 0.17533229 0.36009167 0.30082842 0.06356417 0.09376145
    3 0.1441500 0.04045567 0.00000000 0.00000000 0.11651035 0.00000000 0.38923215
    4 0.0000000 0.12793206 0.22443986 0.00000000 0.33580828 0.07784989 0.25246042
    5 0.0000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
    6 0.2573043 0.45294784 0.33911838 0.48952099 0.28824313 0.39950351 0.06629935
             S69        S6        S72        S73       S74      S75       S76
    1 0.12149135 0.0412393 0.06765101 0.08753762 0.1849937 0.000000 0.1237179
    2 0.06074567 0.0000000 0.31001587 0.27681827 0.0000000 0.000000 0.0000000
    3 0.00000000 0.1797580 0.22941573 0.00000000 0.1807754 1.006459 0.6441024
    4 0.22146362 0.8184629 0.00000000 0.00000000 0.2061156 0.000000 0.0000000
    5 0.00000000 0.0000000 0.00000000 0.27681827 0.0000000 0.000000 0.0000000
    6 0.33253266 0.0000000 0.23310641 0.53134923 0.2188566 0.134840 0.0000000
             S77        S78        S79         S7        S80       S81        S8
    1 0.31370944 0.11909827 0.24942152 0.04719292 0.16464639 0.0433555 0.0000000
    2 0.00000000 0.00000000 0.00000000 0.00000000 0.07761505 0.3386173 0.0000000
    3 0.19649437 0.56472318 0.08971226 0.23596459 0.52008893 0.0000000 0.7211103
    4 0.06213698 0.65223431 0.08971226 0.00000000 0.05488213 0.0000000 0.0000000
    5 0.19649437 0.00000000 0.00000000 0.00000000 0.05488213 0.0000000 0.0000000
    6 0.43896637 0.08421519 0.16353430 0.13348173 0.26499437 0.2410773 0.0000000
              S9
    1 0.08596024
    2 0.00000000
    3 0.26193862
    4 0.00000000
    5 0.00000000
    6 0.24814583
    Consider this

    Take a look at your resulting data set and determine if it is a tidy data set or not. Then describe how you would transform it into a tidy data set and explain why those changes make it fulfill all the conditions for it to be tidy.

    Spoiler alert: It’s not tidy … go ahead and create an object called tidy_counts that’s a tidy data frame.

    Did it!

    [Your answer here]

    This is what your result should look like (it’s probably clear why tidy data sets are also referred to as long data sets at this point …)

    # A tibble: 6 × 9
      Kingdom  Phylum           Class        Order Family Genus Species ID     count
      <chr>    <chr>            <chr>        <chr> <chr>  <chr> <chr>   <chr>  <dbl>
    1 k__Fungi p__Basidiomycota c__Agaricom… o__A… f__Tr… g__M… <NA>    S10   0     
    2 k__Fungi p__Basidiomycota c__Agaricom… o__A… f__Tr… g__M… <NA>    S11   0.0818
    3 k__Fungi p__Basidiomycota c__Agaricom… o__A… f__Tr… g__M… <NA>    S12   0     
    4 k__Fungi p__Basidiomycota c__Agaricom… o__A… f__Tr… g__M… <NA>    S13   0.0958
    5 k__Fungi p__Basidiomycota c__Agaricom… o__A… f__Tr… g__M… <NA>    S14   0.238 
    6 k__Fungi p__Basidiomycota c__Agaricom… o__A… f__Tr… g__M… <NA>    S15   0.0487
    Give it a whirl

    Hm, if only our tidy_counts object also contained the sample meta-data currently in our dataframe soil. Go ahead and add that information to our tidy_counts data frame. Print the first few rows of your data frame to the console to make sure this worked.

    Did it!

    [Your answer here]

    This is what your result should look like (are we having fun yet?)… if you are having problems combining these data frames recall that you need one column in common. It’s easiest if those columns also share the same column name, but you can look up the function to see if there is a way around it if they don’t match up.

    # A tibble: 6 × 17
      Kingdom  Phylum   Class Order Family Genus Species ID     count sampleID plot 
      <chr>    <chr>    <chr> <chr> <chr>  <chr> <chr>   <chr>  <dbl> <chr>    <chr>
    1 k__Fungi p__Basi… c__A… o__A… f__Tr… g__M… <NA>    S10   0      M_FS_1/… M_FS…
    2 k__Fungi p__Basi… c__A… o__A… f__Tr… g__M… <NA>    S11   0.0818 M_FS_1/… M_FS…
    3 k__Fungi p__Basi… c__A… o__A… f__Tr… g__M… <NA>    S12   0      M_FG_FS… M_FG…
    4 k__Fungi p__Basi… c__A… o__A… f__Tr… g__M… <NA>    S13   0.0958 M_FS_1/… M_FS…
    5 k__Fungi p__Basi… c__A… o__A… f__Tr… g__M… <NA>    S14   0.238  M_AS_FS… M_AS…
    6 k__Fungi p__Basi… c__A… o__A… f__Tr… g__M… <NA>    S15   0.0487 M_FS_1/… M_FS…
    # ℹ 6 more variables: block <dbl>, forest <chr>, horizon <chr>, Carbon <dbl>,
    #   Nitrogen <dbl>, pH <dbl>

    12.4 Characterize community diversity

    Okay… now we’re ready to have some fun.

    First, let’s find out what taxonomic groups are present in the entire data set.

    Give it a whirl

    Print a series of tables to the console and in your html document3 that comprise a single column each that show all the phyla, classes, orders, famiies, genera, and species in the data set in alphabetical order, respectively. Note that the values in those columns have a prefix indicate the taxonomic level. Get rid of that in your output.

  • 3 use the function kable() to print the entire data frame in a pretty table

  • Did it!

    [Your answer here]

    Here’s what the table for phylum should look like

    kable(
    tidy_counts %>%
      distinct(Phylum) %>%
      separate(Phylum, into = c("tmp", "Phylum"), sep = "__") %>%
      arrange(Phylum)
    )
    Give it a whirl

    Now, let’s compare patterns across different forest plot types. Create separate tables and print them to the console/have them print neatly in your rendered html files for easy comparison of the mean relative abundance for phylum, order, and family for each forest plot type. Print the first four digits only.

    Describe your results.

    Did it!

    [Your answer here]

    Here’s what the table for phylum should look like

    kable(
    tidy_counts %>%
      group_by(forest, Phylum) %>%
      summarize(mean_rel_abund = mean(count)) %>%
      pivot_wider(names_from = "forest", values_from = "mean_rel_abund"),
    digits = 4
    )
    Give it a whirl

    Pick one forest plot type that you are interested in exploring further. Create tables that make it easy to compare the mean relative abundance for phylum, order, and family across the different soil horizons, and print those three table to the console and to your rendered html report.

    Describe your results.

    Did it!

    [Your answer here]

    Measures of diversity enable us to quantify the complexity of biological communities in a way that allows us to objectively compare communities across space and time. Measures of alpha diversity describe the diversity of a single sample and are based on the number of observed taxonomic groups or their relative abundance, beta diversity describes the variation between samples, gamma diversity describes the global diversity observed across a large number of communities.

    Common alpha diversity statistics include:

    • observed richness the number of observed taxa.
    • Shannon Index (Shannon-Weaver or Shannon-Wiener Index) quantifies how difficult it is to predict the identity of a randomly sampled individual. Ranges from 0 (total certainty) to log(S) (total uncertainty).
    • Simpson Index quantifies the probability that two randomly chosen individuals are the same taxonomic group (ranges from 0 to 1).
    • Inverse Simpson Index is defined as the number of species needed to have the same Simpson index value as the observed community assuming a theoretical community where all species are equally abundant.

    The Shannon and Simpson Diversity are entropy-based indices that measure the disorder (diversity) of a system. In information theory entropy is used to describe the fact that we can quantify the degree of uncertainty associated with with predicted pieces of information. Applied to ecology this means that when describing diversity using these indices we are determining whether or not individuals randomly drawn from a community are the same or different species (or other taxonomic group).

    The relationship between species richness and Shannon diversity is non-linear, i.e. at higher levels of species richness, communities appear more similar in terms of the magnitude of the index compared to lower levels of species richness - which is counter intuitive to the way species richness works. The solution to this is to linearize the indices. As a result, more recently, diversity indices have been proposed where diversity values are converted into equivalent (or effective numbers) of species (jost_entropy_2006?). The effective number of species is the number of species in a theoretical community where all species (taxonomic groups) are equally abundant that would produce the same observed value of diversity (a similar principle is applies in genetics for the concept of effective population sizes). While the definition of effective numbers is not as intuitive as the entropy-based ones the values that we yield are. Not only are the “units” species (instead of being a unitless index), but they have properties that we intuitively understand. For example, effective numbers obey the doubling principle: If you have two communities with equally abundant but totally distinct species and combine them, that new community would have a diversity that is 2x that of the original ones.

    The package vegan has several functions implemented that allow us to calculate these diversity stats. Look up any functions you are not familiar with in the following code and comment it to describe what each line of code is doing.

    diversity <- tidy_counts %>%
      group_by(ID, forest, horizon) %>%
      summarize(richness = specnumber(count),
                shannon = diversity(count, index = "shannon"),
                simpson = diversity(count, index = "simpson"),
                invsimpson = diversity(count, index = "invsimpson")) %>%
      pivot_longer(cols = c(richness, shannon, simpson, invsimpson),
                   names_to = "metric")

    We can create a series of plots that compare the different diversity stats for each forest plot type and soil horizon location4.

  • 4 Well, I can but you will be able to soon, too

  • ggplot(diversity, aes(x = forest, y = value, color = forest)) +
      geom_boxplot(aes(color = forest), outlier.shape = NA, fill = "transparent", size = 1) +
      geom_point(aes(fill = forest),
                 position = position_jitterdodge(jitter.width = .3),
                 shape = 21, color = "black", size = 3) +
      facet_grid(metric ~ horizon, scales = "free") +
      labs(x = " ", y = " ")

    Consider this

    Describe the results.

    Did it!

    [Your answer here]